MPEG-4 Streaming System With Adaptive Error Concealment

ABSTRACT

An MPEG-4 system with error concealment is provided for video service over networks with packet loss. The MPEG-4 system includes an encoder and a decoder. The encoder uses an intra-refresh technique to make the coded bitstream more robust against noise and to stop error propagation. A rate-distortion optimization criterion is introduced to update the intra-coded blocks adaptively, based on the true network condition, with minimal overhead. The Lagrange multiplier is modified to achieve the best rate-distortion balance. In addition, a decoder loop is used in the encoder and is synchronized with the true decoder to achieve the best performance and avoid mismatch with the decoder used in the MPEG-4 system. The decoder is able to achieve resilient decoding under any kind of noise and enhances the reconstructed image quality with a hybrid spatial and temporal concealment method. The results show that a further improvement of 3.65-9.71 dB in peak signal-to-noise ratio (PSNR) can be achieved in comparison with existing methods that adopt spatial copy and zero-motion concealment in decoding.

CROSS-REFERENCES TO RELATED APPLICATIONS

This is a division of U.S. application Ser. No. 10/990,818, filed Nov. 16, 2004, which is incorporated herein by reference.

FIELD OF THE INVENTION

The present invention generally relates to an MPEG-4 streaming system, and more specifically to an MPEG-4 streaming system with an adaptive error concealment scheme that improves the overall quality of video content transmitted over error-prone environments.

BACKGROUND OF THE INVENTION

Delivering better service quality for video streaming over error-prone environments such as the Internet has been a constant challenge for the research community and the industry, because video bitstreams may be corrupted by random errors or suffer packet loss in the channel.

To address the aforementioned problem, the MPEG-4 video coding standard was developed to provide users with a new level of performance for various video communication services, such as video-on-demand (VOD) over the Internet or mobile multimedia applications. An MPEG-4 video system uses a robust encoded bitstream and a resilient decoding process. The robust bitstream is produced by the encoder and helps, at the cost of some coding overhead, the recovery from error corruption. One method for creating a robust bitstream is to insert additional intra blocks to stop error propagation in the decoder. However, inserting intra blocks slightly decreases coding efficiency. Thus, a trade-off between error propagation and coding efficiency must be struck to achieve good performance in MPEG-4 video encoders.

Cote, Shirani and Kossentini proposed an adaptive intra refreshment (IR) scheme for H.263 based on rate-distortion optimization (IEEE Journal on Selected Areas in Communications, vol. 18, No. 6, pp. 952-965, 2002). The rate-distortion optimization improves the timing of intra block insertion so that IR is used optimally under the prevailing Internet conditions.

Another method is to use an error resilient decoding process, which can locate errors and then conceal the lost slices. The error location methods utilize useful header information available at the decoder for coding process resynchronization. For error resilience, MPEG-4 provides several tools, including the resynchronization marker (RM), the data partition (DP), and the reverse variable length coding (RVLC). The optimal usage of the error resilient tools is not specified in the video specification. To further enhance the error-resilient ability, the selection of the optimal parameters, intra refreshment, advanced error detection and concealment methods are required to improve the reconstructed video quality.

Several error concealment methods have been developed for either spatial error concealment (SEC) or temporal error concealment (TEC). The SEC techniques exploit the spatial redundancy within a picture, while the TEC techniques exploit the temporal similarity of frames in a sequence. For spatial error concealment, various interpolation methods, such as multi-directional interpolation (Valente, et al., IEEE Transactions on Consumer Electronics, vol. 147, No. 3, 2001) and quadri-linear interpolation (Kwok, et al., IEEE Transactions on Consumer Electronics, vol. 39, No. 3, 1993), have been developed in addition to the widely used bi-linear interpolation (Kaiser, et al., Signal Processing: Image Communication, vol. 14, No. 6-8, 1999). Multi-directional interpolation needs all neighboring macro-blocks (MB) to correctly decide the edge direction in the lost MB and has much higher computational complexity. Quadri-linear interpolation is an area-based interpolation that takes the nearest four pixels to interpolate the recovered pixel. Two refinements are introduced by Kwok et al. One is to increase the weight of the nearer direction, and the other is to take the average of the nearest pixels and their two neighboring pixels instead of the nearest pixels only. The refinements make the visual result smoother.

For temporal error concealment, blind selection of a motion vector, such as the mean, median, or nearest of the surrounding motion vectors, has been used. The boundary matching algorithm (BMA) is the most common method that uses boundary properties to choose the best motion vector. There are two kinds of BMA. One uses the boundary gradient to choose the result that best matches the boundary between the lost MB and its neighbors. This method can be called a spatial BMA because it uses the spatial boundary correlation. The other uses the boundary difference between the current frame and the previous frame. This method can be called a temporal BMA because it uses the temporal boundary correlation. Another temporal concealment method, decoder motion vector estimation (DMVE), uses a search range and the surrounding area to find the best motion vector according to temporal BMA, or uses a search range to refine the best motion vector of the neighbors. DMVE incurs much higher computational complexity because it tests more motion vectors and uses more surrounding lines for motion estimation.

As spatial concealment is suitable for areas in which spatial correlation is higher than temporal correlation, and temporal concealment is suitable for areas in which temporal correlation is higher than spatial correlation, several hybrid error concealment methods have been developed to take advantage of their respective strengths. A general hybrid scheme uses spatial concealment for I-VOPs and temporal concealment for P-VOPs. Further refinement strategies have also been developed to improve the performance of the hybrid concealment methods. For example, the majority of I-VOPs, excluding the first VOP, have temporal correlation; thus, temporal methods are used to conceal such a VOP. For pictures with conditions such as a scene change, fade in, or fade out, and hence less temporal correlation, spatial methods are used to conceal the VOP. The approach proposed by Kaiser et al. uses spatial activity and temporal activity to decide between spatial and temporal concealment. Spatial activity is calculated as the variance of the nearest neighboring macro-block. Temporal activity is calculated as the mean square error between co-located macro-blocks. When the temporal activity is larger than the spatial activity, spatial concealment is used, and vice versa. Other approaches use the boundary smoothness property. The ratio of the boundary gradient of the lost macro-block to the boundary gradient of the macro-blocks above and below is used to decide whether the boundary gradient of the lost macro-block is too large, in which case spatial concealment is used instead of the temporal method.

However, as more and more applications and activities are brought to the Internet, the competition for bandwidth and the fluctuation of bandwidth availability are more severe than before. It is, therefore, necessary to devise an MPEG-4 streaming system with adaptive error concealment capability in order to deliver adequate performance for video services.

SUMMARY OF THE INVENTION

The present invention has been made to overcome the aforementioned drawback of conventional techniques used in MPEG-4 delivery in an error-prone environment. The primary object of the present invention is to provide an MPEG-4 system with error concealment for video service under the network with packet loss.

The second object of the present invention is to provide an encoder for use in an MPEG-4 video streaming system. The encoder uses an intra-refresh technique to make the coded bitstream more robust against noise and to stop error propagation. A rate-distortion optimization criterion is introduced to update the intra-coded blocks adaptively, based on the true network condition, with minimal overhead. The Lagrange multiplier is modified to achieve the best rate-distortion balance. In addition, a decoder loop is used in the encoder and is synchronized with the true decoder to achieve the best performance and avoid mismatch with the decoder used in the MPEG-4 system.

The third object of the present invention is to provide a decoder which is able to achieve resilient decoding from any kind of noise and enhance the reconstructed image quality with spatial and temporal hybrid concealment method. The result shows that a 3.65-9.71 dB further improvement on peak-signal-to-noise-ratio (PSNR) can be achieved in comparison with the existing methods that adopt spatial copy and zero motion concealment in decoding.

The fourth object of the present invention is to provide a rate distortion optimized intra-refresh (RDIR) method for improving the bit-stream structure according to the network condition to an encoder system with least overhead.

The fifth object of the present invention is to provide an error concealment method combining hybrid concealment scheme and block-based refinement.

The foregoing and other objects, features, aspects and advantages of the present invention will become better understood from a careful reading of a detailed description provided herein below with appropriate reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention can be understood in more detail by reading the subsequent detailed description in conjunction with the examples and references made to the accompanying drawings, wherein:

FIG. 1 shows an MPEG-4 system with error concealment according to the invention;

FIG. 2 shows an embodiment of an encoder according to the invention;

FIG. 3 shows an embodiment of a decoder according to the invention;

FIG. 4 shows an RDIR encoding flowchart used in an embodiment of the invention;

FIG. 5 shows a schematic view of bi-directional error concealment used in the embodiment of the invention;

FIG. 6 shows three different concealment orders;

FIG. 7 shows a flowchart of an embodiment of error concealment of the invention; and

FIG. 8 shows a 3×3 first order smoothing filter used in an embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows a schematic view of an MPEG-4 system of the present invention, including an encoder 102 and a decoder 104. The details of encoder 102 and decoder 104 are illustrated in FIG. 2 and FIG. 3, respectively.

As shown in FIG. 2, an encoder includes an intra-coding module 202, an inter-coding module 204, a rate-distortion (R-D) cost decision module 206, a motion search module 208, an MV module 210, a mode module 212, a mode modified module 214, a motion compensation (MC) module 216, a discrete cosine transform (DCT) module 218, a quantization (Q) module 220, an inverse discrete cosine transform (IDCT) module 222, an inverse quantization (IQ) module 224, and a variable length coding (VLC) module 226. For the encoder to generate error-resilient bitstreams, an error probability model is built, and each macro-block (MB) of the bitstream is passed through the model. The distortion of the MB is calculated from the reconstructed images with and without errors, weighted by the accumulated error probability. If the R-D cost of encoding the current MB in the inter-coding mode is lower than that of the intra-coding mode, the inter-coding mode is selected; otherwise, the intra-coding mode is selected. Such a criterion makes the most efficient use of intra block insertion for a similar service quality. After the coding mode is decided, the current MB is encoded and the coded bitstream is passed to a transmitter.

As shown in FIG. 3, the decoder of the present invention includes a decoding VOP header module 302, a decoding VOP module 304, a timing check and correction module 306, an error detection module 308, an error recovery module 310, an error localization module 312, a frame buffer 314, a hybrid scheme module 316, a spatial concealment module 318, a temporal concealment module 320, a smooth filter 322, and an output buffer 324. First, a received bitstream is parsed to look for continuous resynchronization markers (RM). A successful bitstream parsing indicates that no syntactic errors occur, and the normal decoding resumes. If there is any syntactic error, the decoder will jump to the next RM to resume the decoding processes. After one frame is fully reconstructed, the proposed error concealment algorithm is applied based on the available information from the received bits.

To enhance error resilience, matching solutions at both the encoder and the decoder ends are provided. At the encoder, rate distortion optimized intra-refresh (RDIR), originally developed as a more effective solution to error propagation, is provided to improve the bit-stream structure according to the network condition. The intra-refresh technique inserts intra blocks instead of inter blocks in P frames to prevent serious error propagation over an error-prone network. Since intra-coded blocks cost more bits, intra refresh becomes inefficient when the network condition varies over time. To improve this situation, intra block insertion with R-D optimization adapted to the channel condition can provide the most compact encoder system with the least overhead.

The RDIR design flow is shown in FIG. 4. Starting with step 401, the beginning of the i-th P frame is read, and for each MB of the frame (step 402), the costs of the intra and inter modes, denoted $J_{\text{intra}}$ and $J_{\text{inter}}$, are computed, as shown in step 403, by the following Lagrangian formula:

$$J = D_q + \lambda \cdot R$$

where

$J$: Lagrangian cost

$\lambda$: Parameter used to control the coding bit rate in the encoding process

$D_q$: Distortion induced from residue quantization

$R$: Bits used in coding a macroblock

A better mode for each individual MB can be found by taking both distortion and bit rate into consideration. For transmission over a packet-switched network without reliable quality of service (QoS), not only the quantization distortion but also the concealment error must be included. Therefore, the concealment distortion, weighted by the packet loss rate, is taken into account in the R-D cost calculation. After the costs are computed, the mode with the minimal $J$ is chosen as the coding mode of the current MB, as in step 404. If $J_{\text{inter}}$ is lower than $J_{\text{intra}}$, the inter-coding mode is used, as in step 406; otherwise, the intra-coding mode is chosen, as in step 405. In step 407, if this is the last MB, the process proceeds to the next P frame, as in step 408; otherwise, it returns to step 402 and continues with the next MB of the current P frame. In an error-prone environment, the distortion suffers a more serious quality loss, because it comes from both the original quantization error and the errors introduced when concealing a lost MB from nearby MBs. The above formula therefore needs to be modified as

$$J = \bigl(D_q \cdot (1 - p) + D_c \cdot p\bigr) + \lambda \cdot R$$

where

$D_q$: Distortion induced from residue quantization

$D_c$: Distortion induced from the imperfect concealment algorithm

$p$: Channel packet loss rate
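
The loss-aware cost and the resulting mode decision can be illustrated with a minimal Python sketch. The function names and the numeric inputs are hypothetical; a real encoder would measure the two distortions and the bit count for each macroblock. Ties are resolved in favor of intra coding, following the encoder description given for FIG. 2.

```python
def rd_cost(d_q, d_c, p, lam, rate):
    """Expected Lagrangian cost J = (D_q*(1-p) + D_c*p) + lambda*R."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("packet loss rate p must lie in [0, 1]")
    expected_distortion = d_q * (1.0 - p) + d_c * p
    return expected_distortion + lam * rate


def choose_mode(intra, inter, p, lam):
    """Pick the coding mode with the smaller expected cost.

    intra and inter are (d_q, d_c, rate) tuples; all values are
    hypothetical inputs, not figures from the specification.
    """
    j_intra = rd_cost(intra[0], intra[1], p, lam, intra[2])
    j_inter = rd_cost(inter[0], inter[1], p, lam, inter[2])
    # Inter is chosen only when it is strictly cheaper; otherwise intra.
    return "inter" if j_inter < j_intra else "intra"
```

With these inputs, an inter block that is cheap on a loss-free channel loses to the intra block once the loss rate makes its concealment distortion dominate the cost.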

To achieve the R-D optimization under the proposed intra-refresh encoding, the parameter λ needs to be updated every frame to control the bits used under the same distortion. The updating formula is as follows:

$$\lambda_{n+1} = \lambda_n \left(1 + \alpha \left(\sum_i R_i - n \cdot R_{\text{target}}\right)\right), \qquad \alpha = \frac{1}{20 \cdot R_{\text{target}}}$$

The parameter α was determined from a variety of experimental trials for buffer control. The packet loss rate is used to model the Internet Protocol (IP) network. Using the network condition to model the situation at the decoder is expected to yield better reconstructed image quality. If the modeling were 100% accurate, the quality obtained in the error-prone environment would be the same as that of the transmitted video.
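
The per-frame update of λ can be sketched in Python as follows. This is an illustration under two assumptions: that the step size is α = 1/(20·R_target), and that the sum runs over the bits spent on the n frames coded so far. The function name and arguments are hypothetical.

```python
def update_lambda(lam, frame_bits, r_target):
    """Return lambda for the next frame.

    lam        -- current Lagrange multiplier (lambda_n)
    frame_bits -- bits spent on each frame coded so far (R_1 .. R_n)
    r_target   -- target number of bits per frame
    """
    n = len(frame_bits)
    alpha = 1.0 / (20.0 * r_target)  # assumed reading of the step size
    overshoot = sum(frame_bits) - n * r_target
    # Spending more than the budget raises lambda, which penalizes rate.
    return lam * (1.0 + alpha * overshoot)
```

For example, overspending a 1000-bit target by 200 bits in total raises λ by 1%, and underspending by the same amount lowers it by 1%.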

On the other hand, resynchronization markers (RM) are enabled so that the decoder does not collapse when handling packet loss. If the addresses of the MBs are discontinuous, the decoder skips to the next resynchronization marker and restarts decoding. Since the data from the point where the error starts to the next RM is dropped because its content is uncertain, the length between RMs can greatly influence the reconstruction quality. If the length is long enough to contain several blocks of information, each packet loss causes a serious loss of quality. However, if the length is too short, the redundant information distributed throughout the bitstream makes the encoding inefficient. The trade-off is chosen according to the application domain. For a VOD application at bit rates above 256 kbits per second, 1000 bits is a suitable length for each video packet.
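
The way an address discontinuity translates into a set of lost MBs can be sketched in Python. The packet representation below, a list of (first MB address, MB count) pairs, is a hypothetical simplification of a video packet that starts with a resynchronization marker; it is not the MPEG-4 bitstream syntax.

```python
def find_lost_macroblocks(received_packets, total_mbs):
    """Return the addresses of macroblocks that were never decoded.

    received_packets -- (first_mb_address, mb_count) for each video packet
                        that arrived, in bitstream order
    total_mbs        -- number of macroblocks in the frame
    """
    lost = []
    expected = 0  # address the decoder expects to decode next
    for first_mb, mb_count in received_packets:
        if first_mb > expected:
            # Address discontinuity: everything up to this marker is dropped.
            lost.extend(range(expected, first_mb))
        expected = max(expected, first_mb + mb_count)
    # Macroblocks after the last received packet are lost as well.
    lost.extend(range(expected, total_mbs))
    return lost
```

The returned addresses are what an error concealment stage would then have to fill in; longer packets make each gap in this list larger.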

A robust streaming system needs an error resilient decoding process and a good error concealment method. The error resilient process prevents the decoding process from crashing. The error concealment method helps to improve the quality of images corrupted by transmission errors. As shown in FIG. 3, decoding VOP header module 302 and decoding VOP module 304, which are in the middle part of FIG. 3, constitute the original decoder. The upper part of FIG. 3, including error detection module 308, error recovery module 310 and error localization module 312, constitutes the error resilience functional units. Timing check and correction module 306 is also added to handle VOP header loss. The bottom part of FIG. 3, including frame buffer 314, hybrid scheme module 316, spatial concealment module 318, temporal concealment module 320, smooth filter 322, and output buffer 324, constitutes the error concealment functional units. Together, the error resilience functional units and the error concealment functional units realize a robust decoding system.

Error concealment uses the locations of the lost MBs and the relevant neighboring data to conceal the corrupted VOP. Good concealment results require a simple, high-performance method that uses as much relevant data as possible. Because error concealment is an additional process on top of the original decoding process, its extra computational complexity slows down decoding. Bi-linear interpolation is chosen for spatial concealment and temporal BMA for temporal concealment because of their moderate computational complexity and high performance. Other interpolation methods can also be used for the same purpose. The hybrid scheme decides when to use spatial or temporal concealment. To make as much relevant data as possible available to the concealment method, bi-directional error concealment is used in the present invention, as shown in FIG. 5.
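
Spatial concealment by bi-linear interpolation can be sketched in Python as follows. This is one common formulation, given as an illustration rather than as the exact interpolation of the embodiment: each lost pixel is a distance-weighted average of the nearest correctly decoded pixel above, below, left and right of the block. The function name and the frame layout (a list of rows of integers) are assumptions.

```python
def conceal_block_bilinear(frame, top, left, size):
    """Fill the size x size block at (top, left) in place from its border.

    Assumes the block does not touch the frame edge and that all four
    neighbouring blocks were decoded correctly.
    """
    weight_sum = 2 * (size + 1)
    for i in range(size):
        for j in range(size):
            r, c = top + i, left + j
            # Nearest correctly decoded pixel in each of the four directions.
            p_top = frame[top - 1][c]
            p_bottom = frame[top + size][c]
            p_left = frame[r][left - 1]
            p_right = frame[r][left + size]
            # Distances to those pixels; the nearer pixel gets the larger weight.
            d_top, d_bottom = i + 1, size - i
            d_left, d_right = j + 1, size - j
            total = (d_bottom * p_top + d_top * p_bottom
                     + d_right * p_left + d_left * p_right)
            frame[r][c] = (total + weight_sum // 2) // weight_sum
    return frame
```

On a frame whose luminance varies linearly across the block, this interpolation restores the lost pixels exactly, which is the low-detail case where spatial concealment works well.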

There are three innovations in the error concealment algorithm used in the present invention. The first is a lower-complexity hybrid scheme for choosing between spatial and temporal concealment. The second is block-based concealment that refines the general MB-based method. The third is a simple smoothing filter that improves visual quality.

Based on the previous observations, spatial concealment is suitable for fast-motion or low-detail sequences, since the correlation across successive frames is smaller than the correlation of pixels within the frame. Conversely, temporal concealment is suitable for slow-motion or highly detailed sequences, where it avoids the visible blocking artifacts introduced by spatial concealment. Thus, an adaptive temporal/spatial error concealment scheme is presented to provide video content of better picture quality.

Several considerations to select spatial error concealment or temporal error concealment and block-based concealment are included in the adaptive hybrid error concealment method of the present invention.

Reference hybrid concealment methods use statistical characteristics such as temporal activity, spatial activity, or boundary similarity to decide whether to use spatial or temporal concealment. These methods incur additional computational complexity to obtain that information. For example, if the boundary difference from the BMA result is larger than the threshold, spatial concealment is used to conceal the MB, which may have less temporal correlation. If the boundary difference from the BMA result is smaller than the threshold, the result of temporal concealment is used to conceal the MB.
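
The temporal BMA and the threshold test described above can be sketched in Python. The sketch assumes one possible definition of the boundary difference: the sum of absolute differences between the one-pixel ring around the lost block in the current frame and the same ring, displaced by the candidate motion vector, in the previous frame. The function names, the candidate list and the threshold value are hypothetical.

```python
def boundary_difference(cur, prev, top, left, size, mv):
    """Sum of absolute differences over the ring surrounding the lost block."""
    dy, dx = mv
    total = 0
    for k in range(size):
        ring = [(top - 1, left + k), (top + size, left + k),
                (top + k, left - 1), (top + k, left + size)]
        for r, c in ring:
            total += abs(cur[r][c] - prev[r + dy][c + dx])
    return total


def conceal_with_temporal_bma(cur, prev, top, left, size, candidates, threshold):
    """Try temporal concealment; report "spatial" if the best match is poor."""
    costs = [(boundary_difference(cur, prev, top, left, size, mv), mv)
             for mv in candidates]
    best_cost, best_mv = min(costs)
    if best_cost > threshold:
        # Low temporal correlation: leave the block to spatial concealment.
        return "spatial", None
    dy, dx = best_mv
    for i in range(size):
        for j in range(size):
            cur[top + i][left + j] = prev[top + i + dy][left + j + dx]
    return "temporal", best_mv
```

The candidates would typically be the motion vectors of the surrounding MBs; the caller must make sure that every displaced position stays inside the previous frame.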

Observation of the motion vectors in a sequence shows that when a motion vector is large, the correlation between the surrounding motion vectors is very low because of fast or chaotic motion. Spatial error concealment is therefore used when large motion vectors are detected. In a fast-motion area or at a scene change, the temporal correlation may become very low, and either the motion vectors become chaotic or intra blocks are inserted. The more intra blocks there are, the fewer surrounding motion vectors remain, and the temporal correlation available for recovering the MB becomes insufficient. Spatial error concealment is then used to conceal the MB.

To exploit the strong correlation of pixels within a small area and to fit the 4-MV coding mode used by the MPEG-4 Simple Profile, the block-based error concealment adopts an 8×8 block as its processing unit. Based on the validity of the four surrounding MBs and the location of the current block, the four 8×8 blocks can be concealed in different orders. According to the validity of the four neighbors, there are 15 conditions of concealment order. FIG. 6 shows three of these conditions. The numbers within the central MB indicate the concealment order within the MB. The block-based refinement can apply both spatial and temporal concealment within a single MB.
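
How the validity of the four neighbors could drive the block order can be sketched in Python. Both the ranking rule (blocks bordering more valid neighboring MBs are concealed first) and the reading of the 15 conditions (the sixteen validity patterns minus the one with no valid neighbor) are assumptions made for illustration; the actual orders are those of FIG. 6.

```python
from itertools import product

NEIGHBORS = ("top", "bottom", "left", "right")
# Each 8x8 block of a macroblock borders two of the four neighbouring MBs.
BLOCK_NEIGHBORS = {
    "top_left": ("top", "left"),
    "top_right": ("top", "right"),
    "bottom_left": ("bottom", "left"),
    "bottom_right": ("bottom", "right"),
}


def concealment_order(valid):
    """Order the four 8x8 blocks, best supported first (hypothetical rule).

    valid maps each neighbour name to True if that MB was decoded correctly.
    Ties keep raster order because Python's sort is stable.
    """
    def support(block):
        return sum(valid[n] for n in BLOCK_NEIGHBORS[block])
    return sorted(BLOCK_NEIGHBORS, key=support, reverse=True)


def validity_patterns():
    """All neighbour validity patterns with at least one valid neighbour."""
    return [dict(zip(NEIGHBORS, bits))
            for bits in product((True, False), repeat=4) if any(bits)]
```

Under this rule, a block concealed later can also draw on the blocks of the same MB that were concealed before it.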

The error concealment flowchart, combining the hybrid concealment scheme and block-based refinement, is shown in FIG. 7. Starting at step 701 with the i-th lost MB, the error concealment performs an intra surrounding check in step 702 and a fast motion check in step 703. If either check is positive, the MB-based and block-based spatial concealment is used, as shown in step 704, and the process proceeds to the next MB. Otherwise, the block concealment order is determined in step 705 and the boundary matching algorithm is computed in step 706. In step 707, the BMA result is compared with the threshold. If the boundary difference does not exceed the threshold, step 708 performs motion compensation. Otherwise, a flag is set, as in step 709, and the MB-based and block-based spatial concealment of step 704 is used. The process then continues with the next MB.
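
The decision flow can be condensed into a short Python sketch. It assumes, in line with the earlier description of the threshold test, that a boundary difference above the threshold selects spatial concealment. The limits `max_intra` and `fast_mv` and the callback `bma_cost` are hypothetical names introduced for illustration.

```python
def choose_concealment(neighbor_is_intra, neighbor_mvs, bma_cost, threshold,
                       max_intra=2, fast_mv=16):
    """Return "spatial" or "temporal" for one lost macroblock.

    neighbor_is_intra -- one boolean per surrounding MB
    neighbor_mvs      -- (dy, dx) motion vectors of the inter-coded neighbours
    bma_cost          -- callable returning the best boundary difference
    """
    # Step 702: many intra neighbours leave too few motion vectors to rely on.
    if sum(neighbor_is_intra) > max_intra:
        return "spatial"
    # Step 703: a large surrounding motion vector signals fast motion.
    if any(max(abs(dy), abs(dx)) > fast_mv for dy, dx in neighbor_mvs):
        return "spatial"
    # Steps 706-707: the BMA is evaluated only when both pre-checks pass.
    if bma_cost() > threshold:
        return "spatial"   # step 709, then step 704
    return "temporal"      # step 708: motion compensation
```

Passing the BMA as a callable reflects the intent of the two cheap pre-checks: the more expensive boundary matching is skipped whenever they already select spatial concealment.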

To reduce the blocking effect caused by mismatch in the temporal concealment result, a smoothing filter is applied to the block boundary of each lost MB concealed by temporal concealment. For example, the filter can be a 3×3 first-order filter, as shown in FIG. 8. This filter performs better than the de-blocking filter provided by the reference software and than another 3×3 second-order filter. The smoothing filter can also be applied to spatial concealment results: because the interpolation uses only the nearest four pixels, some unexpected edges are observed, and the smoothing filter makes the interpolation smoother. The same filter can thus be used for the results of both temporal and spatial concealment.
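
Applying a 3×3 smoothing filter to the boundary of a concealed block can be sketched in Python. The kernel weights below are a hypothetical low-pass choice, since the coefficients of the filter of FIG. 8 are not reproduced in the text; the function name and the decision to filter only the outermost ring of the block are likewise assumptions.

```python
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))  # hypothetical weights, sum 16


def smooth_block_boundary(frame, top, left, size, kernel=KERNEL):
    """Low-pass filter the outermost ring of pixels of a concealed block.

    Assumes the block does not touch the frame edge. Reads come from a copy
    so that already filtered pixels do not influence later ones.
    """
    src = [row[:] for row in frame]
    norm = sum(sum(row) for row in kernel)
    for i in range(size):
        for j in range(size):
            if 0 < i < size - 1 and 0 < j < size - 1:
                continue  # interior pixels are left untouched
            r, c = top + i, left + j
            acc = sum(kernel[a][b] * src[r - 1 + a][c - 1 + b]
                      for a in range(3) for b in range(3))
            frame[r][c] = (acc + norm // 2) // norm
    return frame
```

A different kernel can be passed through the `kernel` argument without changing the rest of the routine.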

Several simulation runs were carried out using the system of the present invention. For example, the Foreman and Akiyo sequences are used to evaluate the concealment method under fast motion and slow motion, respectively. The coding parameters are as follows: the encoding frame rate is 30 frames/sec, the decoding frame rate is 10 frames/sec, the packet size is 2000 bits, the GOP structure is I-P-P . . . , and the bit rate is 512 kbits/sec for the normal test. To simulate packet loss off-line and observe the effect of the packet loss rate and the concealment method, random dropping with a uniform distribution is used to simulate different packet loss rates. Because different loss locations produce different results, the results of ten simulation runs are averaged. Seven different types of video sequences, namely Foreman, Akiyo, Mobile, Football, Mother&Daughter, Stefan, and Bus, are tested at 256 kbits/sec (low bit rate) and 768 kbits/sec (high bit rate). The packet loss rates are 1%, 5%, 10%, and 15%. The results show that fast-motion and low-detail sequences need a lower threshold, so that more spatial concealment is used, to get better quality, while slow-motion or highly detailed sequences need a higher threshold. The present invention achieves a 0.3-0.7 dB improvement in PSNR. The results of the simulation indicate that the present invention can achieve better performance than the conventional methods.
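
The evaluation procedure, random packet dropping followed by PSNR averaged over several runs, can be sketched in Python. The PSNR definition for 8-bit samples is standard; the callback `decode_with_losses`, which stands in for the whole decoder and concealment chain, and the fixed seed are hypothetical.

```python
import math
import random


def psnr(reference, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_error = 0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for ref_px, rec_px in zip(ref_row, rec_row):
            squared_error += (ref_px - rec_px) ** 2
            count += 1
    if squared_error == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / (squared_error / count))


def drop_packets(num_packets, loss_rate, rng):
    """Indices of packets lost when each packet is dropped independently."""
    return [i for i in range(num_packets) if rng.random() < loss_rate]


def average_psnr(decode_with_losses, reference, num_packets, loss_rate,
                 runs=10, seed=0):
    """Average PSNR over several random loss patterns.

    decode_with_losses is a hypothetical callback that maps a list of lost
    packet indices to a reconstructed frame.
    """
    rng = random.Random(seed)
    scores = [psnr(reference,
                   decode_with_losses(drop_packets(num_packets, loss_rate, rng)))
              for _ in range(runs)]
    return sum(scores) / runs
```

Averaging over `runs` loss patterns mirrors the ten-run averaging described above, since a single pattern can hit an easy or a hard region of the picture by chance.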

In summary, compared to the prior art, the present invention offers two innovations. The first is the use of a macroblock-based spatial-temporal hybrid error concealment method instead of a frame-based method, which helps to decide more accurately and more efficiently whether spatial or temporal concealment should be used. The second is a fast decision for switching between spatial and temporal error concealment. The boundary difference between the current frame and the previous frame is calculated and compared with a threshold to decide whether the spatial mode should be applied; otherwise, the temporal mode is used. The threshold is chosen by simulation over various bit rates, packet loss rates, and sequences.

Although the present invention has been described with reference to the preferred embodiments, it will be understood that the invention is not limited to the details described thereof. Various substitutions and modifications have been suggested in the foregoing description, and others will occur to those of ordinary skill in the art. Therefore, all such substitutions and modifications are intended to be embraced within the scope of the invention as defined in the appended claims. 

1. A rate distortion optimized intra-refresh (RDIR) method for improving a bit-stream structure according to network condition to an encoder system with least overhead, said method comprising the steps of: (1) reading a macro block of a picture frame; (2) computing an intra block cost and an inter block cost for the macro block; (3) comparing said intra block cost with said inter block cost; (4) choosing an intra-coding mode if said intra block cost is lower than said inter block cost; otherwise choosing an inter-coding mode; (5) proceeding to step (6) if the macro block is a last macro block in the picture frame; otherwise, going to step (1); and (6) repeating step (1) to step (5) for a next picture frame.
2. The method as claimed in claim 1, wherein said intra block cost and said inter block cost are computed with the Lagrangian formula $J = D_q + \lambda \cdot R$, where $J$ is a Lagrangian cost, $\lambda$ is a parameter used to control a coding bit rate in an encoding process, $D_q$ is a distortion induced from residue quantization of said encoding process, and $R$ is a number of bits used in coding a macro block.